Abstract - Two-player zero-sum repeated games are well 
understood. Computing the value of such a game is 
straightforward. Additionally, if the payoffs are dependent 
on a random state of the game known to one, both, or 
neither of the players, the resulting value of the game has 
been analyzed under the framework of Bayesian games. 
This investigation considers the optimal performance in a 
game when a helper is transmitting state information to 
one of the players. 

Encoding information for an adversarial setting (game) 
requires a different result than rate-distortion theory 
provides. Game theory has accentuated the importance 
of randomization (mixed strategy), which does not find a 
significant role in most communication modems and source 
coding codecs. Higher rates of communication, used in 
the right way, allow the message to include the necessary 
random component useful in games. 

I. Introduction 

Two-player zero-sum games play a fundamental role 
in game theory because their analysis is straightforward. 
The min-max theorem of von Neumann [1] establishes a 
unique value of the game. Figure 1 shows the payoff 
matrix for a basic game. Since neither of the pure 
strategies for Player A dominates the other, a mixed 
strategy is optimal. Player A will choose action 0 with 
probability 1/4 and action 1 with probability 3/4. By 
doing this, the expected payoff for Player A is 3/4 no 
matter which strategy Player B chooses. 
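
The mixed strategy quoted above can be checked numerically. The sketch below assumes a hypothetical payoff matrix [[3, 0], [0, 1]] (rows are Player A's actions 0 and 1, columns are Player B's actions), chosen only because it reproduces the stated probabilities and value; the function name `solve_2x2` is illustrative.

```python
from fractions import Fraction


def solve_2x2(payoff):
    """Equalizing mixed strategy for the row player of a 2x2 zero-sum game.

    Assumes the game has no saddle point, so the row player mixes in a way
    that makes the column player indifferent between the two columns.
    """
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in payoff]
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: equalizing formula does not apply")
    p = (d - c) / denom              # probability of playing row 0
    value = (a * d - b * c) / denom  # expected payoff against either column
    return p, 1 - p, value


# Hypothetical payoff matrix (an assumption, not read off Figure 1).
PAYOFF = [[3, 0], [0, 1]]
p0, p1, value = solve_2x2(PAYOFF)

# Expected payoff of the mix against each pure strategy of Player B.
against_column = [p0 * PAYOFF[0][j] + p1 * PAYOFF[1][j] for j in (0, 1)]
print(p0, p1, value, against_column)
```

Under this assumed matrix the mix equalizes the payoff across Player B's two columns, which is the sense in which the payoff holds "no matter which strategy Player B chooses."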

In a Bayesian game, the payoff matrix, which we 
represent by $\Pi(a, b)$ over the domain of pure strategies 
a and b, is a random quantity. We can model this as a 
game with a random state S that determines the payoffs, 
referred to in the literature as the "type." The value 
of such a game depends on the information known to 
the players. If neither player knows S (aside from 
its distribution), then the value of the game can be 
derived from the expected value of the payoff matrix. 
Additionally, if both players know S then the value can 
be derived from the payoff matrix associated with every 
instance of S and averaged. More interesting cases occur 
when only one player knows S or when both players 
have incomplete information about S. 

Fig. 1. Example of two-player zero-sum game. The entries in the 
matrix represent the payoff that Player A receives, which is the 
negative of the payoff for Player B. To maximize the minimum 
expected payoff, Player A chooses action 0 with probability 1/4 and 
action 1 with probability 3/4. This results in an expected payoff of 
3/4 for Player A. 

Consider a situation where different functions of the 
state S are known to both of the players of the game. 
These functions can be represented by a partition over 
the support of S known as the information structure, as 
illustrated in Figure 2. It has been established that games 
of this form can be solved by expanding the space of 
pure strategies to include strategies that depend on the 
available information. The min-max theorem still holds. 
Samples of inquiries into the value of information can 
be found in the following publications: [2], [3], [4], [5]. 
While information can only increase a player's optimal 
score in a game, not all information structures are equal, 
even if they contain the same quantity of information. 
For a given resolution, what is the optimal information 
structure? 

In this work we consider a repeated game setting 
where the state of the game S is i.i.d. and known non-causally 
to a helper who is assisting one of the players 
of the game. The helper can communicate with a rate 
limit of R bits per iteration of the game. The opposing 
player may or may not know S but is certainly aware 
of the conspiracy against him. Even the protocol for 
communication is known (or learned) by the opposing 
player. We establish information-theoretic lower bounds 
on the optimal performance. We show through examples 
that the intuition provided by rate-distortion theory can 
be misleading in this setting. 

[Figure 2: distribution of state S, quantized into bins] 

Fig. 2. Example of information structure. This model for discussing 
partial state ("type") information in a Bayesian game uses a partition 
over the domain of the state to represent the "information structure." 
A player of the game observes only a discrete function of the state 
indicating the cell that the state belongs to. Each player of the game 
may have a different information structure. 

Fig. 3. Erasure game. This game has two states that are equally 
likely. Player A must play A = S or A = e with probability 1. 
Therefore, if Player A does not know the state, A = e is the only 
choice. 

II. Example - Erasure Game 

We illustrate in Figure 3 a two-player game with a 
binary, uniformly distributed state. Player A has three pure 
strategies: 0, e, and 1. Player B has only two strategies, 
0 and 1. The matrix values represent the payoff Player A 
receives for any given state S and pair of actions A and 
B (pure strategies). As this is a zero-sum game, Player B 
receives the negative of the payoff of Player A. We concern 
ourselves with average payoff, averaged with respect to 
probabilities if mixed strategies are involved. 

Notice that Player A will at all costs avoid choosing a 
pure strategy of A = 1 when the state is S = 0, or vice 
versa, hence the label "erasure game."¹ So if Player A is 
ignorant of the state S, he has no option but to choose e 
with probability one. We can therefore deduce the value 
of the game for two of the four information structures in 
Table I. If Player B does not know the state of the game, 
then on average A will get a payoff of 1/2. If Player B 
does know the state S then he will choose the strategy 
B = S, resulting in a payoff of zero. 

¹ The results in this paper actually require the payoffs to be finite. 
Here we use $-\infty$ casually. The same consequences would result 
from a finite but greatly negative payoff in place of $-\infty$. 

TABLE I 
Value of erasure game under varying information structures 

Information structure    Value 
Neither knows S          1/2 
Both know S              3/4 
A knows S                3/2 
B knows S                0 

On the other hand, if Player A knows the state S 
while Player B does not, then equilibrium will occur with 
Player A choosing A = S and scoring 3/2 on average. 
Finally, we can consider the case where they both know 
the state. As we observed in the game of Figure 1, Player 
A will choose the mixed strategy consisting of A = S 
with probability 1/4 and A = e with probability 3/4. 
Player B will choose B = S with probability 1/4 and 
B = 1 — S with probability 3/4. The resulting average 
payoff is 3/4. 
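
The four values discussed above can be sanity-checked with a short script. This is a minimal sketch under assumed payoffs, chosen to be consistent with the values quoted in the text: matching the state pays 3 if B also plays the state and 0 otherwise, erasing pays 0 if B plays the state and 1 otherwise, and a wrong guess costs a large finite penalty. The names `payoff`, `guaranteed`, and `BIG_LOSS` are illustrative. The script computes the payoff each described strategy guarantees against a best-responding Player B; it does not by itself prove that the strategies are optimal.

```python
from fractions import Fraction as F

STATES = (0, 1)        # equally likely
B_ACTIONS = (0, 1)
BIG_LOSS = -10**6      # finite stand-in for the -infinity entries


def payoff(s, a, b):
    """Assumed payoff to Player A in state s for actions a (A) and b (B)."""
    if a == "e":
        return 0 if b == s else 1
    if a == s:
        return 3 if b == s else 0
    return BIG_LOSS


def expected(s, mix, b):
    """Expected payoff in state s when A plays the mix {action: prob}."""
    return sum(prob * payoff(s, a, b) for a, prob in mix.items())


def guaranteed(strategy, b_knows_state):
    """Worst-case average payoff; strategy[s] is A's mix in state s."""
    if b_knows_state:
        # B picks a best response separately in each state.
        return sum(F(1, 2) * min(expected(s, strategy[s], b) for b in B_ACTIONS)
                   for s in STATES)
    # B must commit to one action without seeing the state.
    return min(sum(F(1, 2) * expected(s, strategy[s], b) for s in STATES)
               for b in B_ACTIONS)


always_erase = {s: {"e": F(1)} for s in STATES}          # A ignorant of S
follow_state = {s: {s: F(1)} for s in STATES}            # A plays A = S
mixed = {s: {s: F(1, 4), "e": F(3, 4)} for s in STATES}  # A mixes given S

table = {
    "Neither knows S": guaranteed(always_erase, b_knows_state=False),
    "Both know S": guaranteed(mixed, b_knows_state=True),
    "A knows S": guaranteed(follow_state, b_knows_state=False),
    "B knows S": guaranteed(always_erase, b_knows_state=True),
}
for structure, value in table.items():
    print(f"{structure:16s} {value}")
```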

In the "erasure game" the state is binary, so these four 
information structures mentioned exhaust all determin- 
istic information structures. No intriguing optimization 
problem presents itself in only one iteration of the game. 
For example, we can't ask, "what is the best information 
structure for Player A that has cardinality two?" There 
is only one choice of information structure. Fortunately, 
a more graceful spectrum of information structures is 
available with vector quantization. 

III. Shortcomings of Deterministic Encoding 

Consider a repeated-game setting of the erasure game 
where a helper observes the state of the game and com- 
municates to Player A over a rate-limited channel. Both 
players know all past actions and states. Additionally, 
the helper, and possibly Player B, observe the state 
completely and non-causally. The helper and Player A 
select a block length n and arrange a protocol by which 
the helper will send nR bits describing the state to Player 
A. Player A will then play the game for n iterations. 
Player B has full knowledge of the protocol but does 
not actually see the message. We ask for the maximum 
average payoff that can be achieved for a given rate R. 
In other words, what is the max-min value of this game 
as a function of R? 

We begin with a simple case. As Table I shows, 
if Player A and Player B both know the state S, the 
value of the game is 3/4. But what if Player A only 
learns of the state through communication from a helper, 
while Player B observes the state directly? What rate of 
communication is needed for Player A to still achieve an 
average payoff of 3/4? Recall that the uniquely optimal 
strategy for Player A is to choose A = S with probability 
1/4 and A = e with probability 3/4. 

We can use rate-distortion theory as inspiration for an- 
swering this question. Rate-distortion theory prescribes 
a formula for finding the minimum description rate 
needed to reconstruct a sequence of observations with 
limited average distortion. The procedure is to find the 
correlation of the source and reconstruction that satisfies 
the distortion constraint and results in the lowest mutual 
information. By describing a random source at a rate 
greater than the mutual information, it is possible to 
assure with high probability that the reproduction will 
have the desired correlation with the source as measured 
by first order statistics. 

In the case of the erasure game where Player B knows 
the state S, suppose we decide to encode the state for 
Player A using a rate-distortion-like code and an erasure 
test-channel such that the action sequence results in 
A = S roughly 1/4 of the time and A = e roughly 
3/4 of the time. The rate required is $R > I(S; A) = 1/4$ 
bits per iteration. Unfortunately, this will not result in 
a good strategy for Player A. The encoding schemes 
that arise in rate-distortion theory are deterministic. The 
action sequence $A_1, \ldots, A_n$ is a deterministic function of 
the state sequence $S_1, \ldots, S_n$. Since Player B observes 
the state sequence, he will deduce the actions of Player 
A and anticipate them every time. Player A is effectively 
not playing a mixed strategy. The resulting payoff will 
be 0. The insufficiency of rate-distortion-like codes is a 
concern even when Player B does not know the state. 
After watching the actions of Player A for roughly 
$k = \frac{R}{H(A)} n$ iterations it will be possible to deduce the 
entire action sequence. To a clever opponent who does 
not know the state, the actions will appear random and 
appropriately distributed for the beginning 
$\frac{k}{n} < \frac{R}{H(A)} - \epsilon$ 
portion of the block, for large enough n (related to results 
from [6]), and later in the block, when 
$\frac{k}{n} > \frac{R}{H(A)} + \epsilon$, 
the opponent will be able to decode the sequence with 
high probability and anticipate every action. The bottom 
line is that rate-distortion-like codes place no emphasis 
on producing random actions. 

Fig. 4. Encoding diagram. This illustrates a method of encoding 
a state sequence $S^n$ to allow an action sequence $A^n$ to be random 
even to an observer of the state. An auxiliary variable U is chosen 
which separates S and A into a Markov chain, and a codebook of 
$U^n$ sequences is agreed upon. The state sequence $S^n$ is a random 
realization. The encoder observes $S^n$ and randomly chooses among 
jointly typical $U^n$ sequences from the codebook. After sending the 
index of the $U^n$ sequence to the decoder, an action sequence is 
generated randomly conditioned on $U^n$. If the codebook is populated 
densely enough, the actions $A^n$ will be memoryless and appropriately 
correlated with the state sequence $S^n$. The required density of the 
codebook is a topic visited in [6], [7], [8], and [9]. 

The work in [6] prescribes an encoding scheme for 
generating correlated random variables. The minimum 
description rate for the state sequence $S_1, \ldots, S_n$ needed 
to produce a sequence of actions $A_1, \ldots, A_n$ with a 
distribution that is arbitrarily close in total variation to the 
desired mixed strategy is Wyner's common information 
$C(S; A)$, defined in [7]. The resulting encoding scheme 
for producing this "strong coordination" [8] between the 
state S and the action A uses randomized encoding 
and randomized decoding. Figure 4 illustrates that the 
encoder uses the message to specify the index of a 
sequence $U_1, \ldots, U_n$ from a predefined codebook, just as 
in rate-distortion-like codes, but here the sequences do 
not represent reconstruction sequences. After the decoder 
identifies the sequence $U_1, \ldots, U_n$ he produces the actions 
$A_1, \ldots, A_n$ randomly as the output of a memoryless 
channel from U to A. In this way the codebook is 
separated from the action sequence $A_1, \ldots, A_n$, allowing 
more randomness to be injected into the actions. 

The common information for a binary erasure channel 
can be found in [6]. Thus, the minimum description rate 
needed for a helper to describe the state S to Player A in 
order to achieve the optimal average payoff in the erasure 
game when Player B knows the state is $R > C(S; A) = H(1/4)$, 
where $H(\cdot)$ is the binary entropy function. 
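
To make the gap between the two rates concrete, the sketch below computes the mutual information of the erasure test channel (uniform binary S, with A = S with probability 1/4 and A = e otherwise) and compares it with the binary entropy H(1/4) quoted above as the common information. The helper names are illustrative, and the common information value is taken from the text rather than derived here.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def mutual_information(joint):
    """I(X;Y) in bits for a joint pmf given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


ERASE = 0.75  # probability that A = e
joint = {}
for s in (0, 1):
    joint[(s, s)] = 0.5 * (1 - ERASE)   # A reveals the state
    joint[(s, "e")] = 0.5 * ERASE       # A erases

rate_distortion_like = mutual_information(joint)  # I(S;A)
common_information = binary_entropy(1 - ERASE)    # H(1/4), as quoted in the text

print(f"I(S;A) = {rate_distortion_like:.4f} bits")
print(f"H(1/4) = {common_information:.4f} bits")
```

In this example, producing actions that look memoryless to an observer of the state costs more than three times the rate needed merely to correlate the actions with the state.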

IV. Stationary Mixed Strategy not Necessary 

Just as the information structure in a Bayesian game 
defines the set of pure strategies for each player, the 
description rate of the state that a helper is allowed 
to use also defines the set of block strategies allowed, 
which may have structure and memory from one game 
iteration to the next. We know from implications in [6] 
that the set of achievable stationary strategies is the set 
of conditional distributions that yield $C(S; A) < R$. 
However, stationary strategies are not the only strategies 
worth considering. 

Fig. 5. Degenerate game. This game has two states that are equally 
likely. Player B does not have a choice of alternative strategies; 
therefore, from Player A's point of view this is not really an 
adversarial setting. The payoff of the game depends only on the 
probability with which A = S. The rate-value tradeoff of this 
degenerate game can be cast as a rate-distortion problem. Randomized 
actions for Player A are unnecessary. 

A degenerate Bayesian game is represented by the 
payoff matrices in Figure 5. In this game Player B has 
only one pure strategy. Player A is not really playing 
against an opponent but is simply maximizing the aver- 
age value of a function of the state S and action A. In 
fact, the payoffs in this game are simply the negative 
Hamming distortion between S and A; therefore, the 
rate-value tradeoff for this game is a canonical example 
from rate-distortion theory. A payoff P is achievable at 
rate R if $R > 1 - H(-P)$ for $P > -1/2$, where $H(\cdot)$ 
is the binary entropy function. 
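
The rate-value tradeoff of the degenerate game can be evaluated directly from this formula. The sketch below treats the payoff as a negative Hamming distortion, as described above; the function name `min_rate_for_payoff` and the choice to return zero rate for payoffs at or below -1/2 (where guessing without any description already suffices) are assumptions of the sketch.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def min_rate_for_payoff(payoff):
    """Rate threshold 1 - H(-P) for a target payoff P with -1/2 < P <= 0."""
    if payoff > 0:
        raise ValueError("payoff is a negative Hamming distortion, so P <= 0")
    distortion = -payoff
    if distortion >= 0.5:
        return 0.0  # achievable with no description of the state
    return 1.0 - binary_entropy(distortion)


for target in (0.0, -0.1, -0.25, -0.5):
    print(f"P = {target:5.2f}  ->  R > {min_rate_for_payoff(target):.4f} bits")
```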

In this degenerate Bayesian game the optimal strategy 
has structure and predictability. There is no adversary to 
compete with, so producing a random sequence of ac- 
tions is unnecessary. These degenerate games simplify to 
rate-distortion problems and poignantly highlight situa- 
tions where fully generating correlated random variables 
is not necessary. 

V. The Search for a Graceful Transition 

The state of a Bayesian game should be encoded by 
a helper in such a way that allows a random mixed 
strategy to be correlated with the state, even if that 
strategy is not entirely unpredictable for the duration 
of the communication block. One way to achieve this 
is to follow the encoding procedure used to generate 
correlated random variables, as in Figure 4, but use a 
smaller codebook. 



Specifically, a codebook is constructed by first choosing 
a conditional distribution $p_{U,A|S} = p_{U|S}\, p_{A|U}$ and 
generating $2^{nR}$ sequences $u^n(i)$ independently for each 
i and i.i.d. according to $p_U$. The encoder chooses randomly 
(according to the appropriate distribution prescribed 
in [6]) from all sequences $u^n(i)$ that are jointly 
typical with the state and sends the index $I$ to the 
decoder. The decoder then constructs the sequence $u^n(I)$ 
from the index $I$ and synthesizes a memoryless channel 
according to $p_{A|U}$ to produce an action sequence 
$A_1, \ldots, A_n$. If the rate $R > I(U; S, A)$ then this would 
produce a memoryless strategy, meaning that the opponent 
could not use observations of the past actions and 
states to infer future actions. However, by letting 
$R < I(U; S, A)$ we also allow for situations where 
memoryless strategies are not of the essence. 

In order for the encoder to find at least one jointly 
typical sequence in the codebook with high probability, 
a rate requirement of $R > I(S; U)$ is necessary. 
Beyond that, any excess rate will serve to randomize 
the actions for a portion of the encoding block. At first 
the actions will appear random and correlated with the 
state. After observing a designated fraction of the block, 
the opponent will be able to deduce the index of the 
message using a channel decoder, revealing the future 
of the $u^n(i)$ sequence, from which the future actions 
will be randomly generated. The transition from Player 
B knowing nothing about the next action A to the point 
where Player B can decode $u^n(i)$ with high probability 
becomes sharper as the block length n grows. 

Let $\alpha$ be the transition threshold such that for the first 
k iterations where $k < (\alpha - \epsilon)n$ the actions are random 
according to the mixed strategy $p_{A|S}$ and for the final 
k iterations where $k > (\alpha + \epsilon)n$ the sequence $u^n(i)$ is 
known with high probability, in the limit as n is large. 
Then for the case where Player B does not know the 
state sequence, $\alpha = \frac{R}{I(U; S, A)}$. For the case where 
Player B knows the state sequence, 
$\alpha = \frac{R - I(S; U)}{I(U; A | S)}$. 
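
The thresholds can be evaluated numerically for a concrete choice of U. The sketch below assumes the thresholds take the form $\alpha = R / I(U; S, A)$ when Player B does not know the state sequence and $\alpha = (R - I(S; U)) / I(U; A | S)$ when he does, which agrees with the rate conditions $R > I(S; U)$ and $R > I(U; S, A)$ stated in this section. As a hypothetical example it uses the rate-distortion-like choice U = A with the erasure test channel; all function names are illustrative.

```python
from collections import defaultdict
from math import log2


def entropy(joint, idx):
    """Entropy in bits of the coordinates idx of a pmf {outcome_tuple: prob}."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def cond_mutual_information(joint, x, y, z=()):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(joint, x + z) + entropy(joint, y + z)
            - entropy(joint, x + y + z) - entropy(joint, z))


S, U, A = (0,), (1,), (2,)  # positions of S, U, A in each outcome tuple


def clip(v):
    return max(0.0, min(1.0, v))


def thresholds(joint, rate):
    """Assumed threshold formulas; requires I(U;S,A) > 0 and I(U;A|S) > 0."""
    i_su = cond_mutual_information(joint, S, U)
    i_u_sa = cond_mutual_information(joint, U, S + A)
    i_ua_given_s = cond_mutual_information(joint, U, A, S)
    alpha_state_unknown = clip(rate / i_u_sa)
    alpha_state_known = clip((rate - i_su) / i_ua_given_s)
    return alpha_state_unknown, alpha_state_known


# Hypothetical example: U = A, erased with probability 3/4.
JOINT = {}
for s in (0, 1):
    JOINT[(s, s, s)] = 0.5 * 0.25
    JOINT[(s, "e", "e")] = 0.5 * 0.75

alpha_unknown, alpha_known = thresholds(JOINT, rate=0.25)
print(f"B does not know S^n: alpha = {alpha_unknown:.4f}")
print(f"B knows S^n:         alpha = {alpha_known:.4f}")
```

With this choice and R = 1/4 the second threshold is zero: an opponent who knows the state sequence can anticipate the actions from the very start of the block, matching the earlier observation that a rate-distortion-like code yields a payoff of 0 against such an opponent.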

Let us designate some notation to represent the score 
Player A achieves in a game under different settings. 
Let $\Pi_{p_{A|S}}$ represent the minimum average payoff Player 
A receives by playing a strategy $p_{A|S}$ when Player 
B does not know the state S. When Player B does 
know the state, we use a superscript to indicate this 
and represent the minimum average payoff for Player 
A by $\Pi^{(S)}_{p_{A|S}}$. Additionally, when an auxiliary random 
variable U is involved in constructing the action A such 
that $p_{U,A|S} = p_{U|S}\, p_{A|U}$ and Player B knows U, we 
represent the minimum average payoff for Player A as 
$\Pi^{(U)}_{p_{U,A|S}}$ or $\Pi^{(U,S)}_{p_{U,A|S}}$, depending on whether or not Player 
B also knows the state. Each of these values can be 
computed from the payoff matrix $\Pi$: 
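
One way these four quantities might be written out is sketched below. It assumes that Player B best-responds to the conditional distribution of the state and action given whatever he knows, that $\Pi_S(a, b)$ denotes the payoff to Player A in state S, and that a superscript lists what Player B knows; the indexing by S and the expectation notation are assumptions of this sketch.

```latex
\begin{align*}
\Pi_{p_{A|S}}
  &= \min_{b} \, \mathbb{E}\left[ \Pi_S(A, b) \right], \\
\Pi^{(S)}_{p_{A|S}}
  &= \mathbb{E}\left[ \min_{b} \, \mathbb{E}\left[ \Pi_S(A, b) \mid S \right] \right], \\
\Pi^{(U)}_{p_{U,A|S}}
  &= \mathbb{E}\left[ \min_{b} \, \mathbb{E}\left[ \Pi_S(A, b) \mid U \right] \right], \\
\Pi^{(U,S)}_{p_{U,A|S}}
  &= \mathbb{E}\left[ \min_{b} \, \mathbb{E}\left[ \Pi_S(A, b) \mid U, S \right] \right].
\end{align*}
```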

Theorem 5.1 (Lower bound on rate-value tradeoff): 
If Player B does not know the state sequence, an 
average payoff P for Player A is achievable with a 
state-information description rate of R if 

$\exists\, p_{U,A|S} = p_{U|S}\, p_{A|U}$ 

such that 

If Player B knows the state sequence, an average payoff 
P for Player A is achievable with a state-information 
description rate of R if 

$\exists\, p_{U,A|S} = p_{U|S}\, p_{A|U}$ such that 
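
The threshold argument of this section suggests conditions of the following shape: the block spends a fraction of its length earning the memoryless value and the remainder earning the value with U revealed. This is a sketch assembled from that argument and is not a verbatim statement of the theorem. It assumes the thresholds are capped at 1, uses $\Pi$ with a superscript listing what Player B knows for the minimum average payoff in each phase, and introduces the hypothetical shorthand $\alpha$ and $\alpha^{(S)}$ for the two thresholds.

```latex
\begin{align*}
\text{B does not know } S^n: \quad
  & R > I(S;U), \quad
    P < \alpha \, \Pi_{p_{A|S}} + (1 - \alpha) \, \Pi^{(U)}_{p_{U,A|S}}, \quad
    \alpha = \min\left\{ \frac{R}{I(U;S,A)},\, 1 \right\}, \\
\text{B knows } S^n: \quad
  & R > I(S;U), \quad
    P < \alpha^{(S)} \, \Pi^{(S)}_{p_{A|S}}
        + \left(1 - \alpha^{(S)}\right) \Pi^{(U,S)}_{p_{U,A|S}}, \quad
    \alpha^{(S)} = \min\left\{ \frac{R - I(S;U)}{I(U;A|S)},\, 1 \right\}.
\end{align*}
```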

The lower bound of Theorem 5.1 accommodates settings 
where full randomization is needed, such as optimal 
play in the erasure game when Player B knows the state, 
and it also accommodates efficient communication for 
degenerate games where the payoff does not change 
when the opponent learns about the action A. It is 
not clear, however, whether or not this gives the whole 
tradeoff for sub-optimal play in games where the 
communication limits do not allow for ideal randomization. 
For example, what is the value of the erasure game when 
Player B knows the state sequence and $R < H(1/4)$, 
where $H(\cdot)$ is the binary entropy function? 
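
As a rough illustration of what a single, non-optimized choice of U delivers in this regime, the sketch below averages the payoff over the block for the erasure game when Player B knows the state sequence. It assumes the hypothetical choice U = A with the erasure test channel, the threshold form $(R - I(S;U)) / I(U;A|S)$, a payoff of 3/4 while the actions still look random, and a payoff of 0 once they can be anticipated (both payoff values are the ones quoted earlier in the text). The function name is illustrative, and other choices of U may do better.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


MIX_VALUE = 0.75     # payoff of the 1/4-3/4 mix while actions look random
EXPOSED_VALUE = 0.0  # payoff once Player B can anticipate each action


def block_average_payoff(rate, erasure_prob=0.75):
    """Block-averaged payoff for U = A (an assumed, non-optimized choice)."""
    i_su = 1.0 - erasure_prob                    # I(S;U) for the erasure channel
    i_ua_given_s = binary_entropy(erasure_prob)  # I(U;A|S) = H(A|S) when A = U
    if rate <= i_su:
        # Not enough rate for this code; A = e still guarantees 0.
        return EXPOSED_VALUE
    alpha = min(1.0, (rate - i_su) / i_ua_given_s)
    return alpha * MIX_VALUE + (1.0 - alpha) * EXPOSED_VALUE


for r in (0.25, 0.5, 0.8, 1.1):
    print(f"R = {r:.2f}  ->  payoff {block_average_payoff(r):.4f}")
```

With this particular U the full payoff of 3/4 is only reached at a rate of 1/4 + H(1/4), which is larger than the H(1/4) quoted for the common-information scheme, so the choice of auxiliary variable matters.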

If the bound in Theorem 5.1 is not tight, a couple 
of ideas come to mind for improvement. One is to use 
an encoding method that is not stationary but moves 
from one strategy to another as the opponent learns. In 
game settings, time-sharing must be analyzed carefully. 
The performance depends on whether the time-sharing 
is interleaved or not. A related idea is to use a layered 
encoding scheme, so that the opponent learns the message 
a little bit at a time. A possible layered approach 
follows. 

The helper who observes the state can send a message 
in layers to Player A by selecting two auxiliary 
random variables $U_1$ and $U_2$ and an action A such that 
$S - (U_1, U_2) - A$ form a Markov chain. A codebook 
of $u_1^n$ sequences of size $2^{n(I(S;U_1) + \epsilon)}$ is generated from 
the i.i.d. distribution, and for each of these sequences 
a codebook of $u_2^n$ sequences of size $2^{n(R - I(S;U_1) - \epsilon)}$ is 
generated conditioned on $u_1^n$. We let $\epsilon$ be small. The 
encoder first finds a $u_1^n$ sequence that is appropriately 
correlated with the state sequence and then chooses 
randomly from the $u_2^n$ sequences in the conditional 
codebook that are appropriately correlated with both 
$u_1^n$ and $S^n$. The decoder constructs $u_1^n$ and $u_2^n$ from 
the message and synthesizes a memoryless channel to 
generate the action sequence $A^n$ from $u_1^n$ and $u_2^n$. 

Using this encoding scheme, the block will be divided 
into three sections partitioned by $\alpha_1$ and $\alpha_2$, where 
the knowledge of Player B transitions sharply at these 
thresholds in the limit as n becomes large. For the first 
k iterations where $k < (\alpha_1 - \epsilon)n$ the actions are random 
according to the mixed strategy $p_{A|S}$. For the middle k 
iterations where $(\alpha_1 + \epsilon)n < k < (\alpha_2 - \epsilon)n$ the opponent 
knows $U_1$, which is correlated with A, and for the final k 
iterations where $k > (\alpha_2 + \epsilon)n$ the opponent knows both 
$U_1$ and $U_2$. In the setting where Player B does not know 
the state sequence, $\alpha_1 = \frac{I(S;U_1)}{I(U_1;S,A)}$ and 
$\alpha_2 = \frac{R - I(S;U_1)}{I(U_2;S,A|U_1)}$ 
as long as $\alpha_1 < \alpha_2 < 1$. In the setting where Player B 
knows the state sequence, $\alpha_1 = 0$ and 
$\alpha_2 = \frac{R - I(U_1,U_2;S)}{I(U_2;A|U_1,S)}$ 
as long as $\alpha_2 < 1$.² 


We omit the explicit lower bound that is obtained from 
this encoding scheme. It is a straightforward derivation 
and provides little additional insight. However, if a 
layered scheme proves to provide improvement over 
Theorem 5.1 then the idea of adding additional layers 
with additional auxiliary variables comes to mind. Each 
additional variable would introduce a new phase of 
performance in the game. This quickly becomes cumbersome. 
Perhaps instead there is a smooth way of adjusting 
the strategy as the game proceeds and the opponent infers 
more about the communication. 

² In the case where Player B does not know the state, if 
$\alpha_1 > \alpha_2$ then this layered encoding scheme does not provide 
any benefit over Theorem 5.1. And with either assumption about 
Player B, if $\alpha_2 > 1$, the encoding scheme can be modified to 
increase $\alpha_1$. 

VI. Summary 

We have considered partial state information in a 
Bayesian game from an optimization perspective. If a 
limited amount of state information is passed by a helper 
to one of the players in a two-player zero-sum repeated 
Bayesian game, how much can it increase the value 
of the game? The description of the state can be used 
to correlate the actions in the game with the state. 
In some settings, mutual information is the description 
rate needed to adequately correlate behavior. But in 
the adversarial setting of games, a rate-distortion-like 
compression of the state information at a rate equal to the 
mutual information results in behavior that is predictable 
by the opponent. On the other hand, Wyner's common 
information has been shown to be the description rate 
needed to fully correlate actions in a completely unpre- 
dictable way. However, this can be more than necessary. 

We introduce a communication scheme that performs 
efficiently (Theorem 5.1) in the two extreme cases, 
where memoryless randomization is essential and where 
it is irrelevant. Although the communication is based on 
i.i.d. codebooks, the performance in the game changes 
dramatically mid-way through the communication block 
as the opponent infers the compressed message. The non- 
stationary performance of a stationary encoding scheme 
introduces new challenges in the quest for efficient 
compression. 